Back in the early 1990s, before there was a World Wide Web, there was the Internet Gopher. It was a distributed information system in the same sense as the web, but didn’t use hypertext and was text-based. Gopher was popular back then, as it made it easy to hop from one server to the next in a way that FTP didn’t.
Gopher has hung on over the years, and is still clinging to life in a way. Back in 2007, I was disturbed at the number of old famous Gopher servers that had disappeared off the Internet without a trace. Some of these used to be known by most users of the Internet in the early 90s. To my knowledge, no archive of this data existed. Nobody like archive.org had ever attempted to save Gopherspace.
So I decided I would. I wrote Gopherbot, a spidering archiver for Gopherspace. I ran it in June 2007, and saved off all the documents and sites it could find. That saved 40GB of data, or about 780,000 documents. Since that time, more servers have died. To my knowledge, this is the only comprehensive archive there is of what Gopherspace was like. (Another person is working on a new 2010 archive run, which I’m guessing will find some new documents but turn up fewer overall than 2007 did.)
When this was done, I compressed the archive with tar and bzip2 and split it out to 4 DVDs and mailed copies to a few people in the Gopher community.
This is 15GB compressed, and also includes a rare video interview with two of the founders of Gopher.
From the README file:
The contents of this archive are:
gopher-arch-sqldump.bz2: A dump of the SQL table used by gopherbot to spider
gopher-arch.tar.bz2: The main archive, prepared in June 2007 by
gopher-arch.tar.bz2.sha256sum: sha256sum of the above, for validation
The main archive expands to roughly 40GB
and contains 780238 files or directories
gopherbot.tar.gz: The software used to prepare this data, written in Haskell by John Goerzen
mpm_gopher.mov: A rare interview with some of the founders of Gopher:
Mark McCahill and Farhad Anklesaria, also including some early screenshots.
tar-filelist.bz2: A list from tar of all the files in gopher-arch.tar.bz2
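
For anyone who receives a copy, the .sha256sum file is there to confirm the main archive survived the trip intact. Here is a minimal sketch in Python of how that check might look. It assumes the checksum file uses the usual GNU coreutils layout (a hex digest, whitespace, then the file name), which the README does not actually confirm, and the function names are my own for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file.

    The file is read in chunks so that a multi-gigabyte archive
    never has to fit in memory all at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_path):
    """Read the recorded digest from a checksum file.

    Assumption: the first line looks like '<hex digest>  <file name>',
    as written by GNU coreutils sha256sum.
    """
    first_line = Path(checksum_path).read_text().splitlines()[0]
    return first_line.split()[0].lower()


def verify(archive_path, checksum_path):
    """Return True if the archive's digest matches the recorded one."""
    return sha256_of_file(archive_path) == expected_digest(checksum_path)
```

Calling verify("gopher-arch.tar.bz2", "gopher-arch.tar.bz2.sha256sum") from the directory holding both files should return True for an undamaged copy. If the file is in the coreutils format, running sha256sum -c on it does the same job from the shell.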